Add a public API for copying a table_view to device array #18450
rapids-bot[bot] merged 27 commits into rapidsai:branch-25.06
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test
bdice left a comment:
Initial feedback attached. Thanks for your work on this!
Benchmark Results (table_to_array): [0] NVIDIA RTX 5880 Ada Generation, [1] NVIDIA RTX 5880 Ada Generation
What is the common functionality between this and …
bdice left a comment:
A few comments -- overall this is close!
I looked over the splitting/copying APIs. This is similar to …
vuule left a comment:
A few places where the code could be simplified; otherwise this looks great!
vuule left a comment:
Thank you for addressing the comments!
A few more things and it should be good to go.
cpp/include/cudf/reshape.hpp (Outdated)

```cpp
 */
void table_to_array(table_view const& input,
                    device_span<cuda::std::byte> output,
                    cudf::data_type output_dtype,
```
vuule: could we avoid passing this and derive this type from the column types, since we throw on mismatch anyway?
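The reviewer's suggestion is to drop the `output_dtype` parameter and infer the type from the table itself, since mismatched column types already raise an error. A minimal host-side sketch of that validation logic, using NumPy arrays as stand-ins for cudf columns (the helper name `derive_common_dtype` is illustrative, not part of libcudf):

```python
import numpy as np

def derive_common_dtype(columns):
    # Illustrative helper (not a libcudf API): find the single dtype
    # shared by every column, raising on mismatch, as the review
    # comment suggests doing instead of accepting an output_dtype.
    dtypes = {col.dtype for col in columns}
    if len(dtypes) != 1:
        raise TypeError(f"columns must share a single dtype, got {dtypes}")
    return dtypes.pop()

cols = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(derive_common_dtype(cols))  # float64
```

The same check in the C++ implementation would compare `column_view::type()` across the table's columns and throw on the first mismatch, so the parameter carries no extra information.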
Thanks all!

/merge
Merged 7e555e0 into rapidsai:branch-25.06
Follow-up PR #18801:

Contributes to #16483 by adding fast paths to `DataFrame.to_cupy` (which is called when `DataFrame.values` is accessed). This PR follows up #18450, adding Cython bindings for `cudf::table_to_array` to pylibcudf and plumbing those changes through cudf classic.

I benchmarked the fast (True) and slow (False) paths when the dataframe has 1, 6, 20, and 100 columns. The fast paths use `cudf::table_to_array` if the number of columns is greater than 1, and `cp.asarray` directly if the dataframe has only one column. The slow path uses a [raw Python loop + assignment](https://github.com/rapidsai/cudf/blob/35d58394e7fb5a090ff3cda351403ec092476af5/python/cudf/cudf/core/frame.py#L520) to create the CuPy array.

[benchmark plot]

I used the median because the CUDA overhead of calling `cudf::table_to_array` is large, so there are outliers in the timings. Here is a profile of calling `to_cupy` twice for both the slow and fast paths.

[profile screenshot]

In the first calls, the fast path takes 7.3 ms vs 4.8 ms for the slow path; the first call to `cudf::table_to_array` is the bottleneck. But comparing the second calls, the fast path is much faster (79 µs vs 2.3 ms).

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Matthew Roeschke (https://github.com/mroeschke)

URL: #18801
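The difference between the two paths can be sketched on the host with NumPy standing in for CuPy and cudf: the slow path assigns one column at a time into a preallocated column-major array (one kernel launch per column on the GPU), while the fast path does a single batched copy into the same layout. The function names here are illustrative analogues, not cudf internals:

```python
import numpy as np

def to_array_slow(columns):
    # Host-side analogue of the slow path: a Python loop assigning one
    # column at a time into a preallocated column-major (Fortran-order)
    # array. On the GPU each assignment is a separate copy kernel.
    out = np.empty((len(columns[0]), len(columns)),
                   dtype=columns[0].dtype, order="F")
    for i, col in enumerate(columns):
        out[:, i] = col
    return out

def to_array_fast(columns):
    # Host-side analogue of the fast path: one batched copy of all
    # columns back-to-back (standing in for cudf::table_to_array),
    # then a zero-copy reinterpretation as a column-major 2-D array.
    return np.concatenate(columns).reshape(
        len(columns[0]), len(columns), order="F")

cols = [np.arange(4, dtype=np.float64) + 10 * i for i in range(3)]
assert np.array_equal(to_array_slow(cols), to_array_fast(cols))
```

Both produce identical results; the win is purely in launch overhead, which is why the speedup grows with the column count in the benchmarks above.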
Description
Contributes to #16483.

This PR adds a new libcudf API, `cudf::table_to_array`, which copies data from a `table_view` into a preallocated column-major device array using `cub::DeviceMemcpy::Batched`.

The primary use case for this API is to accelerate the conversion of a `cudf.DataFrame` to a CuPy array when users access `DataFrame.values` in Python. In a follow-up PR, I'll integrate this API into the cudf Python layer.
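"Column-major" here means the columns of the table are laid out back-to-back in one contiguous buffer, which is exactly a Fortran-ordered 2-D array. A minimal host-side sketch of that layout with NumPy (values are illustrative):

```python
import numpy as np

# Two same-dtype columns of a 3-row table.
cols = [np.array([1, 2, 3], dtype=np.int32),
        np.array([4, 5, 6], dtype=np.int32)]

# A batched copy places the columns contiguously in one buffer,
# as cub::DeviceMemcpy::Batched does on the device.
buf = np.concatenate(cols)

# Reinterpreting that buffer in Fortran (column-major) order yields
# the rows x cols matrix: column 0 is [1, 2, 3], column 1 is [4, 5, 6].
arr = buf.reshape(3, 2, order="F")
```

This is why a single batched device-to-device copy suffices: no transpose or per-element gather is needed to produce the array that `DataFrame.values` exposes.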